Exploring the Impact of Network Latency on Transactions
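* Original content from the GreatSQL community. Do not use it without authorization; for reprints, please contact the editor and cite the source.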
1. Background
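We were recently testing data synchronization, using DTS to sync data from Kafka into the database. Syncing 4 GB of data took a little over 4 hours, which seemed unreasonable.
CPU and I/O utilization on the database host were both low, so neither was the bottleneck.
The investigation eventually showed that Kafka, DTS, and the database instance were not in the same data center. The high network latency between them was slowing the sync down.
After Kafka, DTS, and the database instance were deployed in the same data center, sync speed improved dramatically, and the same job finished in only 15 minutes.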
2. Reproducing the Problem
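This test uses sysbench to load data and run a stress test under different network latencies, then compares how the latency affects database transactions.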
2.1 Check the current network latency
$ ping 192.168.137.162
PING 192.168.137.162 (192.168.137.162) 56(84) bytes of data.
64 bytes from 192.168.137.162: icmp_seq=1 ttl=64 time=0.299 ms
64 bytes from 192.168.137.162: icmp_seq=2 ttl=64 time=0.180 ms
64 bytes from 192.168.137.162: icmp_seq=3 ttl=64 time=0.297 ms
64 bytes from 192.168.137.162: icmp_seq=4 ttl=64 time=0.329 ms
64 bytes from 192.168.137.162: icmp_seq=5 ttl=64 time=0.263 ms
64 bytes from 192.168.137.162: icmp_seq=6 ttl=64 time=0.367 ms
64 bytes from 192.168.137.162: icmp_seq=7 ttl=64 time=0.237 ms
64 bytes from 192.168.137.162: icmp_seq=8 ttl=64 time=0.160 ms
64 bytes from 192.168.137.162: icmp_seq=9 ttl=64 time=0.180 ms
64 bytes from 192.168.137.162: icmp_seq=10 ttl=64 time=0.257 ms
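The two hosts are in the same data center. The round-trip latency averages about 0.26 ms, ranging from 0.16 to 0.37 ms across the ten samples.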
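You can reduce the samples to a single number by averaging the `time=` values with awk. This is a minimal sketch that assumes the Linux iputils output format shown above. `avg_rtt` is a hypothetical helper name. Alternatively, running `ping -c 10 -q <host>` prints a ready-made `min/avg/max/mdev` summary line.

```shell
# avg_rtt (hypothetical helper): read ping output on stdin and print the
# mean of the per-packet "time=X ms" values, rounded to 3 decimal places.
avg_rtt() {
  awk '
    {
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^time=/) {        # field looks like "time=0.299"
          sub(/^time=/, "", $i)     # keep only the number
          sum += $i
          n++
        }
      }
    }
    END {
      if (n == 0) exit 1            # no RTT samples found
      printf "%.3f\n", sum / n
    }'
}

# Usage: ping -c 10 192.168.137.162 | avg_rtt
```

Applied to the ten samples above, it prints 0.257.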
2.2 Writing data with sysbench (normal latency)
2.2.1 Create one table and insert 5 million rows
$ time sysbench lua/oltp_read_write.lua --mysql-db=sysbench --mysql-host=192.168.137.162 --mysql-port=3307 --mysql-user=root --mysql-password=greatdb --tables=1 --table_size=5000000 --report-interval=2 --threads=10 --time=600 --mysql-ignore-errors=all prepare
sysbench 1.1.0-df89d34 (using bundled LuaJIT 2.1.0-beta3)
Initializing worker threads...
Creating table 'sbtest1'...
Inserting 5000000 records into 'sbtest1'
Creating a secondary index on 'sbtest1'...
real    1m56.459s
user    0m7.187s
sys     0m0.400s
Inserting 5 million rows took 1m56s.
2.2.2 Run a 5-minute sysbench stress test
SQL statistics:
queries performed:
read: 1711374
write: 488964
other: 244482
total: 2444820
transactions: 122241 (407.37 per sec.)
queries: 2444820 (8147.45 per sec.)
ignored errors: 0 (0.00 per sec.)
reconnects: 0 (0.00 per sec.)
Throughput:
events/s (eps): 407.3725
time elapsed: 300.0718s
total number of events: 122241
Latency (ms):
min: 10.68
avg: 122.72
max: 1267.88
95th percentile: 502.20
sum: 15000894.94
Threads fairness:
events (avg/stddev): 2444.8200/14.99
execution time (avg/stddev): 300.0179/0.02
Result: TPS 407.37, QPS 8147.45.
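You can cross-check the summary above with some arithmetic. The article does not show the `run` command itself, so the thread count below is inferred from the output rather than taken from the command line. It is the latency `sum` (15000894.94 ms) divided by the per-thread execution time (300.0179 s), and it differs from the `--threads=10` used for `prepare`.

```latex
\begin{aligned}
\text{queries per transaction} &= \frac{2444820}{122241} = 20
  = 14_{\text{read}} + 4_{\text{write}} + 2_{\text{other}} \\
\text{threads} &\approx \frac{15000.895\ \text{s}}{300.018\ \text{s}} \approx 50 \\
\text{TPS} &\approx \frac{\text{threads}}{\text{avg latency}}
  = \frac{50}{0.12272\ \text{s}} \approx 407.4
\end{aligned}
```

The last line is Little's law: requests in flight = throughput × latency. Each of the fixed number of client threads waits on its own transaction, so TPS is bounded by how long one transaction takes end to end. The estimate matches the measured 407.37. The two "other" statements per transaction are presumably BEGIN and COMMIT.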
2.3 Simulate network latency with tc
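tc is a Linux network administration tool for configuring and managing traffic control. It can limit bandwidth and inject delay or packet loss, and it can also implement QoS (Quality of Service).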
# Add a 10 ms delay to traffic leaving the ens3 interface
tc qdisc add dev ens3 root netem delay 10ms
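To make the experiment repeatable, you can wrap the netem setup in small functions. This is a minimal sketch. `tc qdisc replace` adds the root qdisc or modifies an existing one, `tc qdisc show` lists it, and `tc qdisc del ... root` restores the default. The function names and the `TC` override (set `TC=echo` for a dry run) are hypothetical conveniences, not part of tc. Real runs require root.

```shell
# Hypothetical helpers around iproute2's tc/netem.
# Set TC=echo to print the tc arguments instead of executing them.
TC=${TC:-tc}

# netem_delay DEV DELAY [JITTER]: add or update a netem delay on DEV's
# egress (outgoing) traffic, e.g. netem_delay ens3 10ms
netem_delay() {
  dev=$1 delay=$2 jitter=${3:-}
  # "replace" creates the root qdisc if it is missing, otherwise modifies it;
  # the jitter argument is passed only when one was given
  $TC qdisc replace dev "$dev" root netem delay "$delay" ${jitter:+"$jitter"}
}

# netem_show DEV: list the qdiscs currently attached to DEV
netem_show() {
  $TC qdisc show dev "$1"
}

# netem_clear DEV: delete the root qdisc, restoring the kernel default
netem_clear() {
  $TC qdisc del dev "$1" root
}
```

Because the helper uses `replace`, you can re-run it with a different value, such as 20ms, without the failure a second `add` on the same device would produce.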
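If tc fails with the error below, install (or upgrade) the extra kernel modules package:
# Error
tc qdisc add dev ens3 root netem delay 10ms
Error: Specified qdisc not found.
# Install the extra kernel modules
$ yum install kernel-modules-extra*
# Reboot the host
$ reboot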
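On RHEL/CentOS 8-family systems, netem is provided by the `sch_netem` kernel module. As far as we know, this module ships in `kernel-modules-extra` rather than in the base kernel package. The wildcard install above may pull in the package for a newer kernel than the one currently running, which is one reason a reboot can be needed. The sketch below assumes the package that matches the running kernel is still available in the configured repositories. In that case, installing that exact version and loading the module may avoid the reboot. `netem_pkg` is a hypothetical helper name.

```shell
# netem_pkg (hypothetical): print the kernel-modules-extra package name that
# matches the running kernel, e.g. kernel-modules-extra-4.18.0-348.el8.x86_64
netem_pkg() {
  printf 'kernel-modules-extra-%s\n' "$(uname -r)"
}

# Usage (as root):
#   yum install "$(netem_pkg)"
#   modprobe sch_netem          # load the netem qdisc module
#   lsmod | grep sch_netem      # confirm the module is loaded
```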
2.4 Check the network latency again
$ ping 192.168.137.162
PING 192.168.137.162 (192.168.137.162) 56(84) bytes of data.
64 bytes from 192.168.137.162: icmp_seq=1 ttl=64 time=10.5 ms
64 bytes from 192.168.137.162: icmp_seq=2 ttl=64 time=10.4 ms
64 bytes from 192.168.137.162: icmp_seq=3 ttl=64 time=10.5 ms
64 bytes from 192.168.137.162: icmp_seq=4 ttl=64 time=10.4 ms
64 bytes from 192.168.137.162: icmp_seq=5 ttl=64 time=10.4 ms
64 bytes from 192.168.137.162: icmp_seq=6 ttl=64 time=10.4 ms
64 bytes from 192.168.137.162: icmp_seq=7 ttl=64 time=10.4 ms
64 bytes from 192.168.137.162: icmp_seq=8 ttl=64 time=10.5 ms
64 bytes from 192.168.137.162: icmp_seq=9 ttl=64 time=10.5 ms
64 bytes from 192.168.137.162: icmp_seq=10 ttl=64 time=10.2 ms
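The round-trip latency is now about 10.4 ms (the ten samples average 10.42 ms). This is roughly the injected 10 ms plus the original sub-millisecond RTT.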
2.5 Writing data with sysbench (10 ms latency)
2.5.1 Create one table and insert 5 million rows
$ time sysbench lua/oltp_read_write.lua --mysql-db=sysbench --mysql-host=192.168.137.162 --mysql-port=3307 --mysql-user=root --mysql-password=greatdb --tables=1 --table_size=5000000 --report-interval=2 --threads=10 --time=600 --mysql-ignore-errors=all prepare
sysbench 1.1.0-df89d34 (using bundled LuaJIT 2.1.0-beta3)
Initializing worker threads...
Creating table 'sbtest1'...
Inserting 5000000 records into 'sbtest1'
Creating a secondary index on 'sbtest1'...
real    2m11.656s
user    0m7.314s
sys     0m0.470s
Inserting 5 million rows took 2m11s.
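Here are the two `prepare` runs compared, using the `real` times reported above:

```latex
\frac{131.656\ \text{s}}{116.459\ \text{s}} \approx 1.13,
\qquad 131.656\ \text{s} - 116.459\ \text{s} = 15.197\ \text{s}
```

Loading the data therefore took only about 13% longer, which is a much milder effect than the one on the stress test below. One plausible explanation, not verified in this test, is that sysbench's `prepare` step sends rows as multi-row INSERT statements. If so, comparatively few round trips pay the extra 10 ms.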
2.5.2 Run a 5-minute sysbench stress test
SQL statistics:
queries performed:
read: 788214
write: 225204
other: 112602
total: 1126020
transactions: 56301 (187.41 per sec.)
queries: 1126020 (3748.16 per sec.)
ignored errors: 0 (0.00 per sec.)
reconnects: 0 (0.00 per sec.)
Throughput:
events/s (eps): 187.4079
time elapsed: 300.4196s
total number of events: 56301
Latency (ms):
min: 210.14
avg: 266.68
max: 493.91
95th percentile: 419.45
sum: 15014235.80
Threads fairness:
events (avg/stddev): 1126.0200/1.16
execution time (avg/stddev): 300.2847/0.16
Result: TPS 187.41, QPS 3748.16. That is only 46% of the no-delay figures, less than half.
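The same arithmetic explains the drop. As before, the thread count is inferred from the output (`sum` / per-thread execution time), not from a command shown in the article. The floor below assumes each of the 20 statements in a transaction costs one client/server round trip, and 10.42 ms is the average of the ten ping samples in 2.4.

```latex
\begin{aligned}
\text{threads} &\approx \frac{15014.236\ \text{s}}{300.285\ \text{s}} \approx 50,
  \qquad \frac{1126020\ \text{queries}}{56301\ \text{transactions}} = 20 \\
\text{latency}_{\min} &\gtrsim 20 \times \overline{\text{RTT}}
  = 20 \times 10.42\ \text{ms} \approx 208\ \text{ms}
  \quad (\text{measured min: } 210.14\ \text{ms}) \\
\text{TPS} &\approx \frac{\text{threads}}{\text{avg latency}}
  = \frac{50}{0.26668\ \text{s}} \approx 187.5
  \quad (\text{measured: } 187.41)
\end{aligned}
```

The network floor alone, about 208 ms, is already well above the 122.72 ms average transaction time measured without the delay. With a fixed 50 client threads, throughput has to fall, and Little's law puts it right at the measured value. The usual ways to compensate are fewer round trips per transaction or more concurrent client threads, though this test did not try either.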
3. Summary
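The tests show that high network latency significantly affects both data loading and the number of transactions executed per second. With about 10 ms of extra round-trip latency, loading 5 million rows took 2m11s instead of 1m56s, and TPS in the read/write stress test dropped from 407.37 to 187.41.
When running performance tests or data synchronization, deploy the benchmark or sync tools in the same data center as the database wherever possible, so that network latency does not distort the results.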
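When repeating a comparison like this one, it helps to save each sysbench `run` report to a file and compute the ratio mechanically. This is a minimal sketch that assumes the report format shown in this article. `sysbench_tps` and `tps_ratio` are hypothetical helper names.

```shell
# sysbench_tps (hypothetical): read a sysbench "run" report on stdin and print
# X from the line "transactions: N (X per sec.)"; fails if no such line exists
sysbench_tps() {
  awk '$1 == "transactions:" { gsub(/\(/, "", $3); print $3; found = 1 }
       END { if (!found) exit 1 }'
}

# tps_ratio BASE_REPORT NEW_REPORT (hypothetical): print the TPS of NEW_REPORT
# as a percentage of the TPS of BASE_REPORT, with one decimal place
tps_ratio() {
  base_tps=$(sysbench_tps < "$1") || return 1
  new_tps=$(sysbench_tps < "$2") || return 1
  awk -v b="$base_tps" -v t="$new_tps" 'BEGIN { printf "%.1f%%\n", 100 * t / b }'
}
```

With the two reports from this article saved as, for example, `normal.txt` and `delay10ms.txt` (hypothetical file names), `tps_ratio normal.txt delay10ms.txt` prints 46.0%.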